ABSTRACT

Device-free (DF) localization in WLANs has been introduced 
as a value-added service that allows tracking indoor entities 
that do not carry any devices. Previous work in DF WLAN 
localization focused on the tracking of a single entity due to 
the intractability of the multi-entity tracking problem whose 
complexity grows exponentially with the number of humans 
being tracked. 

In this paper, we introduce Spot as an accurate and ef- 
ficient system for multi-entity DF detection and tracking. 
Spot is based on a probabilistic energy minimization frame- 
work that combines a conditional random field with a Markov 
model to capture the temporal and spatial relations between 
the entities' poses. A novel cross-calibration technique is in- 
troduced to reduce the calibration overhead of multiple en- 
tities to linear, regardless of the number of humans being 
tracked. This also helps in increasing the system accuracy. 

We design the energy minimization function so that it can
be solved efficiently. We show that the designed function can
be mapped to a binary graph-cut problem whose solution has
linear complexity on average and cubic complexity in the
worst case. We further employ
clustering on the estimated location candidates as a means 
for reducing outliers and obtaining more accurate tracking in 
the continuous space. Experimental evaluation in two typical
testbeds, with a side-by-side comparison with the state-
of-the-art, shows that Spot can achieve a multi-entity track- 
ing accuracy of less than 1.1m. This corresponds to at least 
36% enhancement in median distance error over the state- 
of-the-art DF localization systems, which can only track a 
single entity. In addition, Spot can estimate the number of
entities correctly to within an error of one. This highlights
that Spot achieves its goals of having an accurate and
efficient software-only DF tracking solution of multiple en- 
tities in indoor environments. 

Keywords 

Binary graph-cut, conditional random fields, device-free 
localization, energy minimization, Markov models, multi- 
Figure 1: Typical architecture of a DF WLAN
localization system. 

1. INTRODUCTION 

Device-free (DF) localization [22] is a concept that 
allows the detection and tracking of entities that do not 
carry any devices nor participate actively in the local- 
ization process. DF localization has a number of appli- 
cations including intrusion detection, border protection, 
smart homes, and traffic estimation. 

Different approaches have been proposed for address- 
ing the DF detection and tracking problem that can be 
categorized into two main groups: those that require
special hardware and those that leverage the already 
installed wireless infrastructure. 

Radar-based systems, e.g. [20, 5], computer vision
systems, e.g. [13, 8], and radio tomographic imaging
(RTI) provide accurate detection and tracking.
However, all require the installation of special hardware
to track a DF entity. On the other hand, systems that
leverage the currently installed wireless networks, e.g.
WLAN [22], provide a software-only solution
for DF localization and have the advantage of
scalability in terms of cost and coverage area.

WLAN DF localization systems are based on the con- 
cept [22] that the presence of an entity in an RF envi- 
ronment affects the signal strength, which can be used 
to detect, track, and identify the entities. Figure 1
shows the architecture of a typical WLAN DF local- 
ization system. The system consists of signal transmit- 
ters (e.g. standard APs); signal receivers or monitoring 
points (MPs), such as any WiFi enabled device (e.g. 
laptops and APs themselves); and an application server 
that collects the received signal strength (RSS) readings
for the different streams (where a stream is a single
(AP, MP) pair) and processes them to detect events.

To track entities, and due to the complex relation between
RSS and distance in indoor environments, a fingerprint
has traditionally been used to capture the RSS
behavior at different locations in the area of interest.
To construct the fingerprint, a human stands at dif- 
ferent locations in the area of interest and her effect on 
the RSS of the different streams is recorded at the MPs. 
Constructing the fingerprint for multiple entities though 
requires trying all the combinations of entities over all 
calibration locations, which grows exponentially with 
the number of fingerprint locations. Therefore, current 
effort in WLAN DF localization focuses only on the 
tracking of a single entity. 

In this paper, we introduce Spot as a system for the 
accurate and efficient detection and tracking of mul- 
tiple DF entities in a WLAN environment. Spot is 
based on a probabilistic energy minimization framework 
that combines a conditional random field with a Markov 
model: given an RSS vector of all the streams in the area
of interest, the problem of estimating the most proba- 
ble active user locations is mapped to an energy mini- 
mization problem whose potential function is designed 
to preserve smooth and consistent labels for active lo- 
cations relative to their neighbors and their movement 
history. In addition, we show that the designed energy 
function is regular in the sense that it can be mapped 
to a binary graph-cut problem whose solution has a 
linear complexity on average and a cubic polynomial 
in the worst case. Spot also introduces a novel cross- 
calibration technique to reduce the calibration overhead 
of multiple entities to linear in the number of locations, 
as compared to exponential for the current state-of- 
the-art. This also helps in increasing the system accu- 
racy. 

Since a human can affect more than one location in 
the area of interest, we further employ clustering on 
the estimated location candidates as a means for re- 
ducing outliers and obtaining more accurate tracking 
in the continuous space. Each detected cluster represents
a human whose location is the center of mass of
fingerprint locations inside the cluster. Experimental 
evaluation in two typical testbeds, with a side-by-side 
comparison with the state-of-the-art, shows that Spot 
can achieve a tracking accuracy of less than 1.1m. This 
corresponds to at least 36% enhancement in median er- 
ror over the state-of-the-art DF localization systems in 
the two testbeds, while enabling the tracking of multiple
entities. In addition, Spot can estimate the number of
entities with 100% accuracy to within an error of one.
This accuracy advantage is obtained without sacrificing
computational efficiency.

Figure 2: Spot system architecture.

In summary, the contribution of this paper is four-fold:
(1) We formulate the multi-entity DF problem as an
energy-minimization framework that preserves both
spatial and temporal smoothness and consistency
(Section 2). (2) We show how to map the problem to a
binary graph-cut problem and obtain its solution
efficiently, and present our novel cross-calibration
technique that reduces the calibration complexity to linear
in the number of locations, rather than exponential as in
the current state-of-the-art (Section 3). (3) We present
clustering techniques for reducing noise and enhancing
accuracy (Section 4). (4) We evaluate the system in two
typical WiFi testbeds and compare it to the state-of-
the-art DF WLAN localization techniques (Section 5).

We also discuss issues related to the system design
and present future directions in Section 6. Related work
and paper conclusions are presented in Sections 7 and 8,
respectively.

2. THE SPOT SYSTEM 

In this section, we give the details of Spot. We start 
with an overview of the system architecture, followed by
the details of the system modules. 

2.1 Overview 

Figure 2 shows the system architecture. The system
collects the signal strength readings from the monitoring
points for processing. There are two phases of
operation:

1. Offline training phase: to estimate the system
parameters based on the collected signal strength
readings and construct the device-free fingerprint.
During this phase, a human stands at different
locations in the area of interest and the RSS at each
MP is recorded. Note that our formulation requires
only one human for calibration in the offline
phase, regardless of the number of humans
during the system operation (Section 3.2). This
significantly reduces the calibration overhead as
compared to the state-of-the-art DF systems.

2. Online tracking phase: to estimate the multi-entities' 
locations based on the received signal strength from 
each stream and the fingerprint prepared in the of- 
fline phase using the energy minimization frame- 
work. 

The Noise Filtering module reduces the noise in the 
RSS readings and filters outlier streams. 

The Energy Minimization Framework module calculates the
probabilities used in the energy function, constructs an
equivalent graph, and estimates the
most probable active locations (i.e. the environment map)
by solving a binary graph-cut problem.

The Multi-Entity Detection and Estimation module 
uses clustering techniques to estimate the number of 
entities and the location of each entity. A non-zero es- 
timated number of entities is equivalent to a detection 
event in the area of interest. 

2.2 System Model 

Without loss of generality, let X be a 2-dimensional
physical space. At each location $x \in X$, we can get
the signal strength from k streams. We denote the
k-dimensional signal strength space as S. Each element
in this space is a k-dimensional vector, $s = (s_1, \ldots, s_k)$,
whose entries represent the signal strength readings from
the different (AP, MP) pairs. We further assume that the
samples from different streams are independent.

Given that m humans are standing in the area of 
interest, m > 0, these humans will affect the different 
streams. Therefore, the problem becomes: 

Problem 1. We want to both estimate the number
of humans m and, if m > 0, the locations of these
humans $\{x_i \mid 0 < i \le m, x_i \in X\}$, such that the probability
$P(x_1, x_2, \ldots, x_m \mid s)$ is maximized.

In Section 3, we assume a discrete X space. We discuss
the continuous space case in Section 4.

2.3 Noise Filtering 

The aim of this module is to preprocess collected RSS 
readings during the offline and online phases to reduce 
the noise effects and detect outliers. We use two tech- 
niques: RSS filtering and stream filtering. 



2.3.1 RSS Filtering 

RSS is a noisy quantity due to the time-varying wireless
channel [21]. To reduce the noise effect, we apply
an α-trimmed mean filter [19] to the measured RSS values.
An α-trimmed filter has the advantage of handling
both impulse and Gaussian noise, as compared to mean
and median filters that can handle only one of them.
In addition, it is simple to implement: given a window
of q RSS samples, the α-trimmed filter sorts the
samples (such that $RSS_1 \le RSS_2 \le \cdots \le RSS_q$),
discards the $\lfloor \alpha q \rfloor$ smallest and the $\lfloor \alpha q \rfloor$
largest samples, and averages the
remaining samples. The output of the α-trimmed mean
filter is:

$$\overline{RSS}(q, \alpha) = \frac{1}{q - 2\lfloor \alpha q \rfloor} \sum_{i=\lfloor \alpha q \rfloor + 1}^{q - \lfloor \alpha q \rfloor} RSS_i \tag{1}$$

where $0 \le \alpha < 0.5$. We set α to 0.2 as it is a reasonable
value for the window size we use in our system
(Section 5).
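As an illustration only, the α-trimmed mean filter can be sketched in Python as follows. The function name `alpha_trimmed_mean` and the convention of dropping $\lfloor \alpha q \rfloor$ samples from each end of the sorted window are our assumptions for this sketch, not a description of the actual Spot implementation.

```python
import math

def alpha_trimmed_mean(samples, alpha=0.2):
    """Average a window of RSS samples after trimming both tails.

    Assumed convention: floor(alpha * q) samples are dropped from each
    end of the sorted window of q samples.
    """
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    if not samples:
        raise ValueError("the window must contain at least one sample")
    ordered = sorted(samples)
    q = len(ordered)
    trim = math.floor(alpha * q)      # samples discarded at each end
    kept = ordered[trim:q - trim]     # never empty because alpha < 0.5
    return sum(kept) / len(kept)

# Hypothetical window (dBm) with one deep fade (-90) and one spike (-30).
window = [-60, -61, -59, -90, -60, -62, -58, -61, -60, -30]
filtered = alpha_trimmed_mean(window, alpha=0.2)
```

With α = 0.2 and a window of ten samples, the two smallest and the two largest readings are dropped, so both the fade and the spike are excluded from the average.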

2.3.2 Streams Filter 

Even after smoothing the RSS values using the α-trimmed
filter, the readings of a single stream may
have changed significantly between the offline and online
phases due to changes in the environment. To
detect this change and filter outlier streams, we use
the Analysis of Variance (ANOVA) to test whether the
mean RSS of a particular stream has changed significantly
between the offline and online phases. If
there is a statistically significant difference, the stream
is excluded from the current calculations.
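The stream test can be sketched as a one-way ANOVA over two groups of samples (offline and online). The sketch below is illustrative: the function names and the threshold `f_critical` are hypothetical, and in practice the critical value would be looked up for the chosen significance level and degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def stream_is_outlier(offline_rss, online_rss, f_critical=4.96):
    """Flag a stream whose mean RSS differs between the two phases.

    f_critical is an assumed threshold: 4.96 is roughly the 5% critical
    value of the F distribution with (1, 10) degrees of freedom.
    """
    return one_way_anova_f([offline_rss, online_rss]) > f_critical

# Hypothetical readings (dBm) of one stream.
offline = [-60, -61, -59, -60, -62, -58]
online_stable = [-60, -59, -61, -60, -61, -59]
online_shifted = [-70, -71, -69, -70, -72, -68]
stable_flag = stream_is_outlier(offline, online_stable)
shifted_flag = stream_is_outlier(offline, online_shifted)
```

A stream flagged this way would simply be left out of the RSS vector used in the current estimation step.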

3. ENERGY MINIMIZATION FRAMEWORK 

In this section, we assume a discrete X space with n 
locations. Let $\{a_i^t, 0 < i \le n\}$ be a set of Bernoulli random
variables, where $a_i^t$ takes the value 1 if a human
is standing at location $i \in X$ at time t, and 0 otherwise.
Therefore, the problem of estimating the number of entities
m and their locations, given the received signal
strength vector s (Problem 1), is equivalent to finding
the assignment of the $a_i^t$'s that maximizes

$$P(M^t \mid s) \tag{2}$$

where $M^t = (a_1^t, a_2^t, \ldots, a_n^t)$. We refer to $M^t$ as the
environment map at time t. In this case, $m = \sum_{i=1}^{n} a_i^t$
and the most probable locations of the m entities are
the locations whose $a_i^t$'s are assigned the value one.

Traditional work on probabilistic WLAN localization,
both device-based and device-free, e.g. [23, 15], uses
Bayesian inversion to estimate $P(M^t \mid s)$. However, these
systems typically assume only one entity in the area of
interest. Moving to more than one entity makes this
Bayesian inversion approach intractable, as the complexity
of estimating $P(s \mid M^t)$ increases exponentially with
the number of entities that need to be tracked [15] (due
to the need to try all combinations of humans' poses in
the area of interest).

Alternatively, we use an energy-minimization framework
that leverages the joint estimation of the $a_i^t$'s
to enhance the accuracy while, at the same time,
obtaining an efficient solution.

In particular, we represent the spatial constraints on
the human position by a Conditional Random Field
(CRF) model favoring coherence between adjacent
locations. The temporal relation between the human
locations is captured by a second-order Hidden Markov
Model (Figure 3). Estimation is finally performed by
mapping the problem to a binary graph-cut problem
that can be efficiently solved in $O(n)$ on average and
$O(n^3)$ in the worst case.

A CRF is an undirected graphical model that defines
a log-linear distribution over label sequences given a
particular observation sequence [10]. It was introduced
as a framework for labeling and segmenting data that
models the conditional probability $P(Y \mid X)$, where X
and Y are the observations and the labels, respectively.
CRFs have the advantage of relaxing the strong
independence assumptions made by Hidden Markov Models
[17] for a large number of variables (such as those in
the environment map). In addition, CRFs avoid the
label bias problem [8], a weakness exhibited by maximum
entropy Markov models [9] (MEMMs) and other
conditional Markov models based on directed graphical
models. Therefore, CRFs outperform both MEMMs
and HMMs on a number of real-world sequence labeling
tasks [8, 11, 15].

By combining an HMM for temporal relations with a
CRF for spatial relations, we gain the best of both
worlds in terms of accuracy and efficiency. In the rest of
this section, we describe the construction of the energy 
minimization framework and how we efficiently solve it. 

3.1 Framework Construction 

Our model extends the model in Equation 2 to capture
the temporal constraints. In particular, our goal
becomes finding the environment map at time t, $M^t$,
that maximizes:

$$P(M^t \mid s^t, M^{t-1}, M^{t-2}) \tag{3}$$

assuming a second-order temporal dependence in the
Markov model, as we discuss in detail later.

Based on CRF theory [9], our combined model estimates
the probability of Equation 3 as

$$P(M^t \mid s^t, M^{t-1}, M^{t-2}) \propto \exp\{-E^t\} \tag{4}$$

where $E^t = E(s^t, M^t, M^{t-1}, M^{t-2})$ is an energy function
that captures the required constraints on the DF
tracking problem. That is, we want to estimate the
current environment map given the previous two
environment maps and the current signal strength vector
measured at the monitoring points. This is obtained by
the joint maximization of the posterior in Equation 4,
which is equivalent to the minimization of the energy:

$$\hat{M}^t = (\hat{a}_1^t, \hat{a}_2^t, \ldots, \hat{a}_n^t) = \arg\min E^t \tag{5}$$

Figure 3: Combined CRF-HMM model. This
graphical model illustrates both the signal
strength likelihood together with the spatial and
temporal priors. The same temporal chain is
repeated at each discrete location. Spatial
dependencies are illustrated for a 4-neighborhood
system. The entire environment map affects the
RSS vector $s^t$.

Energy Terms: For our DF tracking problem, each
$E^t$ is composed of three components:

$$E^t = E(s^t, M^t, M^{t-1}, M^{t-2}) = V^{T}(M^t, M^{t-1}, M^{t-2}) + V^{Sp}(M^t, s^t) + U^{SS}(M^t, s^t) \tag{6}$$

The term $V^{T}(M^t, M^{t-1}, M^{t-2})$ is a temporal prior
term that represents a second-order Markov chain that
imposes a tendency to temporal continuity on the
environment map.

The term $V^{Sp}(M^t, s^t)$ represents a spatial prior term
which imposes a tendency to spatial continuity of the
environment map, favoring coherent assignments.

Finally, the $U^{SS}(M^t, s^t)$ term is a likelihood term that
evaluates the evidence for location labels based on the
RSS distributions in the case of human absence and
presence.

This energy model captures the signal strength
likelihood together with the spatial and temporal priors.
Figure 3 shows the graphical representation of the
model. Details of these terms are given in the next
subsections.
Figure 4: Temporal transitions at a location (assuming
the human affects four locations inside
the circle). (a) An entity moves towards the
right from time $t-2$ to time $t-1$. (b) Between
the two time instances, locations may remain
in their own active or inactive states (denoted $A$
and $A^c$, respectively) or change state, thus defining
four different kinds of temporal transitions:
$A \to A$, $A \to A^c$, $A^c \to A$, $A^c \to A^c$. Those
transitions influence the label that a location in the
environment map is going to assume at time t.

3.1.1 Temporal prior term 

Figure 4 shows the four different temporal transitions
a location assignment (label) can undergo in an environment
map, based on an analysis of two time instances. For
instance, an active location may remain active (locations
labeled $AA$ in Figure 4) or move to the inactive
state (locations labeled $AA^c$), etc. It is important to
note that a first-order Markov chain is inadequate to
convey the nature of temporal coherence in this problem;
a higher-order Markov chain is required. For example,
a location that was inactive at time $t-2$ and is
active at time $t-1$ is far more likely to remain active at
time t than to go back to the inactive state. A second-
order Markov chain is used to balance performance and
complexity. We quantify the effect of the order of the
chain in Section 5.2.5.

These intuitions are captured probabilistically and
incorporated in our energy minimization framework by
means of a second-order Markov chain, as shown in the
graphical model of Figure 3. The temporal transition
priors ($P(a_i^t \mid a_i^{t-1}, a_i^{t-2})$) are learned during the
training phase. This leads to the following joint temporal
prior term:

$$V^{T}(M^t, M^{t-1}, M^{t-2}) = \beta \sum_{i=1}^{n} -[\log P(a_i^t \mid a_i^{t-1}, a_i^{t-2})] \tag{7}$$

where β < 1 is a discount factor to allow for multiple
counting across non-independent locations. The optimal
value of β, as well as the other parameters of the
CRF, are obtained discriminatively from the training
data using the iterative scaling algorithm.
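To make the temporal prior term concrete, the sketch below evaluates it for a small map. It is only an illustration: the function name, the table layout (a dictionary keyed by the pair $(a_i^{t-1}, a_i^{t-2})$ that stores the probability of being active at time t), the value of β, and all probabilities are hypothetical.

```python
import math

def temporal_prior_energy(curr, prev1, prev2, p_active_given, beta=0.9):
    """Temporal prior energy of a candidate map `curr`.

    curr, prev1, prev2: lists of 0/1 labels at times t, t-1, t-2.
    p_active_given[(a_prev1, a_prev2)]: assumed P(a^t = 1 | a^{t-1}, a^{t-2}),
    which must lie strictly between 0 and 1 so that the logarithm is finite.
    """
    energy = 0.0
    for a_t, a_t1, a_t2 in zip(curr, prev1, prev2):
        p_active = p_active_given[(a_t1, a_t2)]
        p = p_active if a_t == 1 else 1.0 - p_active
        energy += -math.log(p)
    return beta * energy

# Hypothetical learned priors, keyed by (a^{t-1}, a^{t-2}).
priors = {(1, 1): 0.9, (1, 0): 0.8, (0, 1): 0.2, (0, 0): 0.05}
prev2 = [0, 0, 1]
prev1 = [0, 1, 1]
smooth_energy = temporal_prior_energy([0, 1, 1], prev1, prev2, priors)
flicker_energy = temporal_prior_energy([1, 0, 0], prev1, prev2, priors)
```

A map that continues the recent history receives a lower energy than one that flips every label, which is the behavior the temporal prior is meant to encourage.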

3.1.2 Spatial prior term 

This term should favor coherent environment maps,
i.e. adjacent locations should have similar labels. We adopt a
variation of the Ising model commonly used for segmentation
applications [2], where the spatial energy term can
be represented as:

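$$V^{Sp}(M^t, s^t) = \gamma \sum_{\{c_i, c_j\} \in N} [a_{c_i}^t \neq a_{c_j}^t] \left( \frac{1 + e^{-\|P(s^t \mid a_{c_i}^t) - P(s^t \mid a_{c_j}^t)\|^2}}{2} \right) \tag{8}$$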

where N is the set of pairs of neighboring locations. The
term $P(s^t \mid a_{c_i}^t)$ represents the conditional probability of
receiving the signal strength vector $s^t$ when the human
is present at location $c_i$ ($a_{c_i}^t = 1$) or not present
($a_{c_i}^t = 0$). This can be estimated during the training phase as
described in Section 3.2. The constant γ is a strength
parameter for the coherence prior that can be estimated
based on the training data.

3.1.3 Likelihood for signal strength 

The term $U^{SS}(M^t, s^t)$ is the log likelihood of the
received signal strength. The term is defined as:

$$U^{SS}(M^t, s^t) = \delta \sum_{i=1}^{n} [-\log P(s^t \mid a_i^t)] \tag{9}$$

where δ < 1 is a discount factor to allow for multiple
counting across non-independent locations, whose optimal
value is obtained discriminatively from the training
data.

RSS likelihoods are learned during the offline training 
phase as described in the next section. 
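The likelihood term can be illustrated with the short sketch below, which uses the stream-independence assumption of Section 2.2 to turn the vector likelihood into a sum of per-stream log probabilities. The fingerprint layout (`fingerprint[state][location]` holding one histogram per stream), the integer-dBm binning, and all numbers are assumptions made for this example only.

```python
import math

def rss_log_likelihood(rss, stream_hists, rss_min):
    """Sum of per-stream log probabilities (streams assumed independent)."""
    total = 0.0
    for value, hist in zip(rss, stream_hists):
        # Map the reading to a histogram bin, clamped to the valid range.
        b = min(max(int(round(value)) - rss_min, 0), len(hist) - 1)
        total += math.log(hist[b])   # hist entries must be non-zero
    return total

def likelihood_energy(env_map, rss, fingerprint, rss_min, delta=0.9):
    """Likelihood energy of a candidate environment map.

    fingerprint[state][loc] is an assumed layout: a list of per-stream
    histograms for location loc in state 0 (inactive) or 1 (active).
    """
    return delta * sum(
        -rss_log_likelihood(rss, fingerprint[label][loc], rss_min)
        for loc, label in enumerate(env_map))

# Toy fingerprint: two locations, one stream, three RSS bins (-3..-1 dBm).
toy_fp = [
    [[[0.7, 0.2, 0.1]], [[0.1, 0.2, 0.7]]],   # inactive histograms
    [[[0.1, 0.2, 0.7]], [[0.7, 0.2, 0.1]]],   # active histograms
]
reading = [-1]
good = likelihood_energy([1, 0], reading, toy_fp, rss_min=-3)
bad = likelihood_energy([0, 1], reading, toy_fp, rss_min=-3)
```

The map that agrees with the observed reading gets the lower energy; histogram smoothing (Section 3.2.1) is what keeps the logarithm finite for readings never seen in training.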

3.2 Fingerprint Construction 

During the offline phase, Spot needs to estimate both
the RSS likelihood, $P(s^t \mid a_i^t)$, and the temporal
transition priors, $P(a_i^t \mid a_i^{t-1}, a_i^{t-2})$. This is the functionality
of the Fingerprint Builder Module.

3.2.1 RSS likelihood 

Based on the described signal strength terms in the
energy function, i.e. the spatial prior and signal strength
likelihood, the fingerprint of Spot differs from those of
previous device-based and device-free WLAN localization
systems. Figure 5 shows the difference between
the fingerprint for a traditional DF system and
that of Spot. In particular, we use a cross-calibration
technique, where an entity standing at location x contributes
to the active RSS likelihood of x ($P(s^t \mid a_x^t = 1)$)
and the inactive RSS likelihoods of all the remaining
n - 1 FP locations ($P(s^t \mid a_i^t = 0, \forall i \neq x)$). This has
two advantages: (1) it reduces the coverage sparsity
problem in the presence of few streams, and (2) it
converts the intractable exponential number of cases of
building the fingerprint for traditional DF systems [15]
to a linear complexity problem, as only one human is
needed for training, regardless of the number of humans
to be tracked.

Figure 5: Difference between fingerprint (FP)
construction for traditional DF systems and
Spot. The left figure represents an example while
the figure to the right represents all required
combinations (FP complexity). (a) FP construction
for one entity in a traditional DF system:
one histogram, representing the active state
RSS, is stored in only one location (where the
user is standing). (b) FP construction for two
entities in a traditional DF system: two humans
are needed along with trying all their pose
combinations in the area of interest ($\binom{n}{2}$). A
total of $2^n$ combinations are required to capture
the fingerprint of all possible numbers of
humans and their locations. (c) FP construction
in Spot: only one human is needed to construct
the FP regardless of the actual number of
humans to be tracked (due to the environment
map formulation). A human standing at one
location (x) captures the RSS active histogram
at this location ($P(s^t \mid a_x^t = 1)$) and affects the
inactive histograms at all other FP locations
($P(s^t \mid a_i^t = 0, \forall i \neq x)$) (cross-calibration). This
leads to two histograms at every FP location.

In summary, at each location, we have two histograms
for the RSS, corresponding to the active and inactive
states respectively, using the cross-calibration technique.
The fingerprint is the collection of these two histograms
over all locations $x \in X$. We smooth the generated
histograms by convolution with separable Gaussian kernels
to avoid the zero-probability problem of missing values
in the training set.
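The cross-calibration idea can be sketched as follows. This is a simplified illustration under our own assumptions: readings are binned per integer dBm, each stream gets its own 1-D histogram (one way to read "separable"), and the names `build_fingerprint`, `smooth`, and `gaussian_kernel` as well as the kernel width are hypothetical.

```python
import math

def gaussian_kernel(sigma=1.0, radius=3):
    weights = [math.exp(-(d * d) / (2 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(hist, kernel):
    """Convolve a histogram with the kernel and renormalize to sum to one."""
    radius = len(kernel) // 2
    out = [0.0] * len(hist)
    for i in range(len(hist)):
        for j, w in enumerate(kernel):
            src = i + j - radius
            if 0 <= src < len(hist):
                out[i] += w * hist[src]
    total = sum(out)
    return [v / total for v in out] if total > 0 else out

def build_fingerprint(samples, n_locations, n_streams,
                      rss_min=-100, rss_max=-20):
    """samples: (human_location, rss_vector) pairs from ONE calibrating human.

    Returns fp[state][location][stream] -> smoothed histogram, where
    state 1 is active and state 0 is inactive.
    """
    n_bins = rss_max - rss_min + 1
    counts = [[[[0.0] * n_bins for _ in range(n_streams)]
               for _ in range(n_locations)] for _ in range(2)]
    for human_loc, rss in samples:
        for loc in range(n_locations):
            # Cross-calibration: the same sample is "active" evidence for
            # the occupied location and "inactive" evidence for all others.
            state = 1 if loc == human_loc else 0
            for stream, value in enumerate(rss):
                b = min(max(int(round(value)) - rss_min, 0), n_bins - 1)
                counts[state][loc][stream][b] += 1
    kernel = gaussian_kernel()
    return [[[smooth(h, kernel) for h in loc_hists]
             for loc_hists in state_hists] for state_hists in counts]

# Hypothetical calibration pass: one reading per location, two streams.
samples = [(0, [-60, -70]), (1, [-65, -72]), (2, [-50, -80])]
fp = build_fingerprint(samples, n_locations=3, n_streams=2)
```

Each training sample is visited once per location, so the number of calibration measurements grows linearly with the number of fingerprint locations.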




Figure 6: Finite state diagram for the possible 
temporal transitions at any location. The sum 
of arcs originating from any node is one, leading 
to only four degrees of freedom. 

3.2.2 Temporal transition prior 

Although there are eight possible transitions (Figure 6),
due to probabilistic normalization
($P(a_i^t = 1 \mid a_i^{t-1}, a_i^{t-2}) = 1 - P(a_i^t = 0 \mid a_i^{t-1}, a_i^{t-2})$),
the temporal priors have only
four degrees of freedom. These temporal transition priors
are learned from the training data.
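One simple way to estimate these four free parameters is by counting label triples in the training sequences, as in the sketch below. The counting estimator, the add-one smoothing, and the function name are our assumptions for illustration; the source does not specify how the priors are learned.

```python
from collections import Counter

def learn_transition_priors(label_sequences, smoothing=1.0):
    """Estimate P(a^t = 1 | a^{t-1}, a^{t-2}) for the four histories.

    label_sequences: per-location lists of 0/1 labels over time.
    Returns a dict keyed by (a^{t-1}, a^{t-2}); P(a^t = 0 | ...) follows
    by normalization, so only four numbers are stored.
    """
    active = Counter()
    total = Counter()
    for seq in label_sequences:
        for t in range(2, len(seq)):
            history = (seq[t - 1], seq[t - 2])
            total[history] += 1
            active[history] += seq[t]
    return {h: (active[h] + smoothing) / (total[h] + 2 * smoothing)
            for h in [(0, 0), (0, 1), (1, 0), (1, 1)]}

# Hypothetical label history of one location during training.
learned = learn_transition_priors([[0, 0, 1, 1, 1, 0, 0, 0]])
```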

3.3 Most Probable Map Estimation 

In this section, we show how to obtain the optimal
environment map by solving the energy minimization
problem in Equation 5 efficiently through mapping it
to a binary graph-cut problem. We start with a brief
background on graph-cuts, followed by how to map the
DF energy minimization problem to a graph problem.

3.3.1 Binary graph-cuts 

Let $\mathcal{G} = (\mathcal{E}, \mathcal{V})$ be a directed graph with nonnegative
edge weights that has two special vertices (terminals):
the source s and the sink t. An s-t-cut (or a binary
graph-cut) $C = \{S, T\}$ is a partition of the vertices of
$\mathcal{V}$ into two disjoint sets S and T such that $s \in S$ and
$t \in T$. The cost of the cut is the sum of the costs of all
edges that go from S to T:

$$c(S, T) = \sum_{u \in S,\, v \in T,\, (u,v) \in \mathcal{E}} c(u, v)$$

The minimum s-t-cut problem is to find a cut C with
the smallest cost. Ford and Fulkerson [4] proved that
this is equivalent to computing the maximum flow from
the source to the sink. This problem can be solved in
low-order polynomial time in n [3].^1 This way, a binary
graph-cut can be considered as a binary labeling of the graph
nodes to be either s or t.

^1 Note, however, that generalizations of the minimum s-t-cut
problem to involve more than two terminals are NP-hard.
We prove in the next subsection that our problem can be
mapped to a binary graph-cut problem.

Figure 7: Mapping the DF energy minimization
problem to a binary graph-cut problem. (a) Constructed
graph with edge weights. (b) An example cut.

3.3.2 DF tracking as a binary graph-cut problem

Not every energy minimization function can be solved
using a graph-cut approach. According to [6], the following
theorem gives a necessary and sufficient condition
for a function to be solved using the binary min-cut
algorithm.

Theorem 1. Let E be a function of n binary variables
in the form of

$$E(x_1, \ldots, x_n) = \sum_{i} E^{i}(x_i) + \sum_{i<j} E^{i,j}(x_i, x_j)$$

Then, E is graph-representable if, and only if, each term
$E^{i,j}$ satisfies the inequality

$$E^{i,j}(0,0) + E^{i,j}(1,1) \le E^{i,j}(1,0) + E^{i,j}(0,1). \tag{10}$$

Note that the condition only involves the binary terms,
i.e. those that involve the relation between two variables.
This maps only to the spatial consistency term
in our DF energy function (Equation 8).

Corollary 1. The DF energy minimization function
is graph-representable.

Proof. The proof follows directly by mapping the
terms of Equation 8 to Equation 10, noting that the
LHS of Equation 10 is zero in the DF tracking problem
and the two RHS terms are positive. □

The above corollary tells us that we can find an efficient
polynomial-time solution to the DF energy minimization
problem using the binary graph-cut mapping.
Figure 7 shows how our energy minimization problem
can be mapped to a binary graph-cut problem. We construct
a graph that has n + 2 nodes, where n nodes are
the original discrete environment map locations and two
additional nodes are added to represent the source and
sink nodes. There are two types of edges: those between
the original discrete environment map locations
(n-edges) and those between each node and the source
and sink terminal nodes (t-edges). The edge weights
are assigned in the following way to guarantee that the
min-cut solution to this graph is equivalent to minimizing
the energy function in Equation 5 [6]:

1. The t-edge between the source and node x is assigned
a weight of $P(s^t \mid a_x^t = 0) + P(a_x^t = 0 \mid a_x^{t-1}, a_x^{t-2})$.

2. The t-edge between node x and the sink is assigned
a weight of $P(s^t \mid a_x^t = 1) + P(a_x^t = 1 \mid a_x^{t-1}, a_x^{t-2})$.

3. The n-edge (x, y) between node x and node y is 

assigned a weight oi 2 • 

Theorem 2. The binary graph-cut solution on the
constructed graph is a solution to the corresponding energy
minimization problem in Equation 5.

Proof. The proof is in the appendix. □ 

Any node connected to the source (sink) node after 
the cut is considered inactive (active). 
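The construction can be illustrated end to end with the sketch below. It is not the algorithm of [3]: for brevity it uses a plain Edmonds-Karp max-flow, which yields the same minimum cut but without the fast average-case behavior. The t-edge weights are passed in as precomputed sums, and a single hypothetical constant `n_weight` stands in for the n-edge weight; requiring it to be nonnegative is what makes an Ising-like pairwise term satisfy Equation 10.

```python
from collections import deque

EPS = 1e-12  # residual capacities below this are treated as exhausted

def min_st_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (cut value, source-side node set)."""
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0.0) + c
            residual[v].setdefault(u, 0.0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})
    flow = 0.0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:      # breadth-first search
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > EPS and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:                   # no augmenting path left
            return flow, set(parent)
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:             # push flow along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def estimate_map(w_inactive, w_active, neighbors, n_weight):
    """w_inactive[x]: weight of the t-edge source -> x (inactive evidence).
    w_active[x]: weight of the t-edge x -> sink (active evidence).
    neighbors: pairs (x, y) joined by n-edges of weight n_weight (assumed)."""
    if n_weight < 0:
        raise ValueError("n-edge weights must be nonnegative")
    n = len(w_inactive)
    cap = {"src": {}, "snk": {}}
    for x in range(n):
        cap["src"][x] = w_inactive[x]
        cap.setdefault(x, {})["snk"] = w_active[x]
    for x, y in neighbors:
        cap[x][y] = n_weight
        cap[y][x] = n_weight
    cut_value, source_side = min_st_cut(cap, "src", "snk")
    # Nodes still connected to the source are inactive (0), others active (1).
    return [0 if x in source_side else 1 for x in range(n)], cut_value

# Three locations in a row; only location 0 has strong "active" evidence.
labels, cut_value = estimate_map(
    w_inactive=[0.2, 1.7, 1.7], w_active=[1.7, 0.3, 0.3],
    neighbors=[(0, 1), (1, 2)], n_weight=0.1)
```

In this toy example the cut separates location 0 from the source, so the estimated map marks only that location as active.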

3.4 Computational Complexity 

The binary graph-cut algorithm requires $O(n^3)$ operations,
where n is the number of fingerprint locations.
However, we use the algorithm in [3] as it provides a
fast iterative algorithm. Although the algorithm has
the same complexity in the worst case, its average
complexity is $O(n)$. This has been confirmed in our
experiments.

3.5 Discussion 

Using the proposed technique, we reduce the
training complexity from $O(2^n)$ to $O(n)$. This is a
significant reduction in the calibration overhead, which
makes the multi-entity tracking problem feasible.

The proposed framework also treats the detection and
tracking problems in a homogeneous manner. In particular,
detection can be regarded as a special case of the
system, where a non-zero estimate of the number of
entities is equivalent to a detection event.

4. MULTI-ENTITY DETECTION AND ESTIMATION

The output of the binary graph-cut operation is a set
of candidate locations. However, these locations cannot
be used directly, as a human presence at a location
typically affects the signal strength at more than
one neighboring location (Figure 4), leading to overestimating
the actual number of humans and their locations.
This effect on neighboring locations decreases as
we move away from the actual human location. Therefore,
the Multi-Entity Detection and Estimation module
applies clustering to the output of the binary graph-cut
algorithm, such that the number of output clusters determines
the number of entities and the center of mass of
each cluster gives the coordinates of the human corresponding
to this cluster. This not only solves the problem of
overestimating the number of entities, but also locates
the entities in the continuous space by the weighted
averaging of all the samples within a cluster. To further
enhance accuracy, we apply clustering to the last
w environment maps by merging them into one map.

Figure 8: An example dendrogram showing the
hierarchical clustering inconsistency. Dotted
lines represent split clusters. The figure shows
three clusters corresponding to three entities in
the area of interest.

4.1 Approach 

We use a hierarchical clustering technique (Figure 8)
as it gives us an intuitive means to estimate the number
of clusters. In particular, leaf nodes represent individual
candidates. Each internal node represents a
possible cluster. As we go up in the tree, clusters are
combined to form a bigger cluster, using the Euclidean
distance between cluster centers as a similarity measure.
The root of the tree corresponds to one cluster that contains
the entire set of candidate nodes. Starting from
the root of the tree, if the degree of inconsistency between
two clusters is high, based on a parameter r, we
split them into two separate clusters. This process is
repeated recursively for each of the split clusters until the
degree of inconsistency is below r. The final number
of clusters represents the estimated number of humans
and the center of mass of each cluster is the estimated
human location.
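A simplified version of this step is sketched below. Note the substitutions: instead of building the full tree and splitting top-down on an inconsistency coefficient, the sketch merges clusters bottom-up and stops when the closest pair of cluster centers is farther apart than a hypothetical threshold `max_merge_distance`, and it uses an unweighted center of mass. It is meant only to show how candidate locations turn into an entity count and continuous coordinates.

```python
import math

def centroid(points):
    """Unweighted center of mass of a list of (x, y) points."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def cluster_candidates(candidates, max_merge_distance=2.0):
    """Agglomerative clustering with centroid linkage.

    candidates: (x, y) coordinates of locations labeled active.
    max_merge_distance: assumed stopping threshold (stands in for the
    inconsistency parameter r in the text).
    Returns one centroid per estimated entity.
    """
    clusters = [[p] for p in candidates]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_merge_distance:
            break                       # remaining clusters are far apart
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

# Hypothetical active locations (meters) produced by the graph cut.
candidates = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
entities = cluster_candidates(candidates)
num_entities = len(entities)   # a non-zero count is a detection event
```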

4.2 Clustering Complexity 

The hierarchical clustering requires O(c^) operations, 
where c is the number of candidate locations. Typically,
$c \ll n$. Therefore, clustering has a low overhead. We
quantify this effect in Section 5.



Figure 9: Experimental testbeds. (a) Testbed 1. (b) Testbed 2.

Table 1: Default parameter values.

Parameter   Default value   Meaning
k           6               Num. of used streams
n           25              Num. of FP locations
w           13              Window size
            2               HMM order
r           0.25            Clust. inconsistency thr.

5. PERFORMANCE EVALUATION 

In this section, we analyze the performance of Spot
and compare it to deterministic [16] and probabilistic
[15] state-of-the-art DF WLAN localization systems.
We start by describing the experimental setup and data 
collection. Then, we analyze the effect of different pa- 
rameters on the system performance. 

5.1 Testbeds and Data Collection 

We evaluate our system in two different testbeds
(Figure 9). The first testbed covers a residential apartment
with an area of 114m² (about 1228 sq. ft.) while the
second testbed represents an office building with an area
of 130m² (about 1400 sq. ft.). The two testbeds were
covered by TP-Link TL-WA500G APs and D-Link Airplus
G+ DWL-650 wireless NICs.

For data collection, we used a sampling rate of one
hertz. We had six RSS data streams for both testbeds.
A total of 25 fingerprint locations, uniformly distributed
over the testbed, are sampled for both testbeds. An
independent test set at 17 (22) test locations for the
first (second) testbed was collected at different times
and by different persons.

We give the details of the results of the first testbed 
and summarize the results of the second. Figure 10 
shows an example of the output of the system. 

5.2 Parameters Effect 

In this section, we study the effect of changing the
system parameters on the performance of Spot. The
average distance error is used as the main metric, where
the error is calculated as the distance between the
estimated location and the closest ground truth location
(for multiple entities). We present two versions of the
average distance error, depending on the level of detail
needed. If determining the zone in which the person is
standing is the main target, then the average distance
error is calculated based on the centers of the estimated
and actual zones, and we call it the zones-based
difference (Figures 11-16). On the other hand, if a higher
level of detail is required, i.e. the exact person
location, we calculate the difference based on the
coordinates of the ground truth and the estimated location,
and we call it the locations-based difference
(Figures 19-24).

To calculate the distance error for multiple entities,
we use the Euclidean distance between the estimated
zone/location of each entity and the closest fingerprint
zone/location¹.
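As an illustration of this metric, the sketch below averages the Euclidean distance from each estimate to its closest reference point. The function name `average_distance_error` is hypothetical, and the sketch covers only the case where every estimate is matched to its nearest reference point. Passing zone centers gives a zones-based value, while passing exact coordinates gives a locations-based value.

```python
import math

def average_distance_error(estimates, ground_truth):
    """Mean distance from each estimate to its closest ground-truth point."""
    errors = [min(math.dist(e, g) for g in ground_truth) for e in estimates]
    return sum(errors) / len(errors)

# Two entities; coordinates are in meters and purely illustrative.
truth = [(0.0, 0.0), (4.0, 0.0)]
estimated = [(0.0, 1.0), (4.0, 3.0)]
print(average_distance_error(estimated, truth))
```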

Table 1 shows the default values of the different
parameters.

5.2.1 Window size (w)

Figures 11 and 19 show that increasing the window 
size enhances the system accuracy. This is due to lever- 
aging more information. This, however, increases the 
latency of the location estimation. Therefore, an ap- 
plication should balance the latency-accuracy tradeoff 
based on its requirements. 

5.2.2 Clustering inconsistency threshold (r) 

Figures 12 and 20 show that for small values of r, 
i.e. r < 0.15, the system tends to generate one cluster, 
regardless of the number of entities in the area of in- 
terest, underestimating the true number of humans. As 
r approaches its maximum value, i.e. one, the system 
generates a lot of clusters, overestimating the actual 
number of humans. This quantifies the advantage of 
the clustering module. An optimal value for r occurs 
around 0.25. 

5.2.3 Fingerprint density (n) 

Figures 13 and 21 show that increasing the fingerprint
density increases accuracy. As few as 15 locations,
corresponding to a density of one FP location every
7.6m², are enough to achieve the best accuracy. Increasing
the density beyond this value does not significantly
enhance the accuracy.
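The quoted density can be checked with a short calculation, assuming the Testbed 1 area of about 114m² (1228 sq. ft.) and a uniform spread of FP locations. The value for the default n = 25 is derived here for comparison and is not stated in the text.

```latex
\frac{114\ \mathrm{m}^2}{15\ \text{locations}} = 7.6\ \mathrm{m}^2 \text{ per FP location},
\qquad
\frac{114\ \mathrm{m}^2}{25\ \text{locations}} = 4.56\ \mathrm{m}^2 \text{ per FP location}.
```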

5.2.4 Number of streams (k) 

Figures 14 and 22 show that increasing the number of
streams increases the system accuracy, especially for a
higher number of entities, up to a certain limit after
which the performance saturates. As few as four streams can
achieve less than 1.6 meter overall accuracy for the
zones-based difference.

¹If the estimated number of entities is less than the actual
number of entities, we use the testbed center as the ground
truth.

5.2.5 HMM Order (o) 

Figures 15 and 23 show that a second order model
enhances performance over lower order models. In some
cases, a third order model performs worse than a second
order model due to over-training. This justifies the use
of a second order HMM.

5.3 Comparison with Other DF Systems 

5.3.1 Accuracy 

Figures 16 and 24 show the CDF of distance error for
the different techniques (note that the current
state-of-the-art supports only one entity). Tables 2 and 3
summarize the results for the two testbeds. The results
show that Spot has the best performance under the two
testbeds, with an enhancement of at least 36% in median
error over the best state-of-the-art techniques for the
zones-based difference and at least 15.49% in average
error for the locations-based difference. All techniques
perform better in Testbed 2 due to the closer separation of
training points in Testbed 2.
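The percentages in Table 2 are reported relative to the single-entity Spot result. The short sketch below shows one way to compute such a relative difference, using a hypothetical helper `percent_over`, and reproduces two of the table entries from the reported error values.

```python
def percent_over(reference, value):
    """Relative difference of value over reference, in percent."""
    return (value - reference) / reference * 100.0

# Testbed 2 median errors from Table 2: Spot-One 1.1 m, Prob. Nuzzer 1.5 m.
print(round(percent_over(1.1, 1.5), 1))
# Testbed 1 average errors from Table 2: Spot-One 1.43 m, Prob. Nuzzer 2.66 m.
print(round(percent_over(1.43, 2.66), 1))
```

The first value corresponds to the 36.4% entry behind the "at least 36%" figure, and the second to the 86% entry.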

Figure 17 also shows that Spot can estimate the number
of entities in the area of interest with at most one
difference error. This can further be enhanced as
described in Section 6.

5.3.2 Running Time 

Figure 18 and Table 2 show the running time for the
different techniques and Spot components. The results
show that the overall Spot operations take less than 1.9ms
per location estimate for both testbeds. The clustering
component consumes the largest time, followed by the
min-cut algorithm, and finally calculating the
probabilities.

Although Table 2 shows that all algorithms have the
same complexity (as c << n), the running time does
differ. This is due to the proportionality constants for
the small n and m values in our experiment. Spot takes
a slightly higher running time than the deterministic
technique (less than 4.75% on average for both testbeds).
However, it significantly outperforms the probabilistic
Nuzzer technique, with a 65% enhancement on average in
running time. This highlights that Spot's significant gain
in accuracy and reduction in training overhead come
at a negligible increase in running time.

6. DISCUSSION 

In this section, we discuss different aspects of Spot. 

6.1 Path Training

Table 2: Performance summary for the different systems under the two testbeds using the zones-based
difference as a metric. Numbers between parentheses represent the percentage of Spot-One entity
advantage. c is the number of candidate locations after the graph-cut phase in Spot and the first
phase of probabilistic Nuzzer. c is typically << n.

| System | Testbed 1: median error | Testbed 1: average error | Testbed 1: running time | Testbed 2: median error | Testbed 2: average error | Testbed 2: running time | Complexity |
|---|---|---|---|---|---|---|---|
| Spot-One ent. | 1m | 1.43m | 1.95ms | 1.1m | 1.25m | 1.9ms | O(n.m + c^) |
| Spot-Multi-ent. | 1.75m (75%) | 2.15m (50.3%) | 2.56ms (31.4%) | 0.85m (-22.7%) | 1.72m (37.6%) | 2.4ms (26.3%) | |
| Prob. Nuzzer [15] | 2.3m (130%) | 2.66m (86%) | 3.53ms (81.35%) | 1.5m (36.4%) | 1.64m (31.2%) | 2.85ms (49.84%) | O(n.m + n.c) |
| Det. Nuzzer [16] | 3m (270%) | 3.54m (147.5%) | 1.78ms (-8.4%) | 2.7m (145.5%) | 3.12m (149.6%) | 1.92ms (1.1%) | O(n.m) |


Using the proposed framework, we could reduce the
training complexity from O(2^n) to O(n). This is a
significant reduction in the calibration overhead, which
turns the multi-entity tracking problem into a feasible
problem. However, there is still some effort in calibrating
the area of interest, as the user has to stand at each
location for a certain time. One possibility to reduce
this overhead is to use path-based training, where a
user continuously moves between two points and samples
are collected along the path. This continuous calibration
reduces the overhead, but provides fewer samples.
Multiple passes around the area of interest can be used
to increase the number of available samples, along with
density interpolation between adjacent locations. Further
experiments need to be performed, though, to assess
the tradeoffs of this technique.

6.2 Identification 

Although we can track multiple entities in the area
of interest, identifying these entities remains an open
problem. This identification includes knowing the
entity's physical identity (e.g. its name) or virtual
identity, i.e. associating a unique ID to the detected
entity. This entity labeling problem is well known in other
fields, such as computer vision. The entities' movement
history and trajectories can be used to detect these
virtual identities.

6.3 Number of Entities History Model 

Spot can correctly estimate the number of entities 
with high accuracy. This can be further enhanced based 
on adding constraints for the temporal smoothness of 
the number of entities. In other words, outliers in es- 
timating the number of entities can be detected based 
on the history of the detected number of entities. This 
can handle cases such as clusters merging and splitting. 
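One simple way to impose such temporal smoothness, offered here only as an illustrative assumption and not as part of Spot, is to replace each estimated count with the median of the most recent estimates. The function name `smooth_counts` and the `history` parameter are hypothetical.

```python
from statistics import median_low

def smooth_counts(counts, history=5):
    """Replace each count with the low median of the last `history` counts."""
    smoothed = []
    for t in range(len(counts)):
        window = counts[max(0, t - history + 1): t + 1]
        smoothed.append(median_low(window))
    return smoothed

# Two entities are present throughout; 3 and 1 are momentary outliers,
# e.g. caused by a cluster splitting or two clusters merging.
raw = [2, 2, 3, 2, 2, 1, 2, 2]
print(smooth_counts(raw, history=3))
```

A longer history suppresses more outliers but reacts more slowly when an entity really enters or leaves the area.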

7. RELATED WORK 

Device-free tracking systems have been introduced
over the years, including radar-based, camera-based,
sensor-based, and WLAN-based systems. Table 4 shows how
Spot compares to the different systems.

In the radar-based systems, pulses of radio waves are 
transmitted into the area of interest and based on mea- 
suring the received reflections, objects could be tracked. 
Several technologies have been presented in this class 
including ultra-wideband (UWB) systems [20], Doppler
radar p^, and MIMO radar systems 0. 

Camera-based tracking systems are based on analyz- 
ing a set of captured images to estimate the current 
locations of objects of interest [131 IH]- The analysis 
consists of two main processes: background subtraction 
and temporal correspondence. 

Sensor-based systems use especially installed sensor 
nodes to cover the area of interest. For example, [18] 
applies radio tomographic techniques to the readings of 
a dense array of sensors to obtain accurate DF tracking. 

All the technologies above share the requirement of 
installing special hardware to be able to perform DF 
tracking, which reduces their scalability in terms of cost 
and coverage area. In contrast, WLAN DF tracking 
aims at exploiting the already installed WLAN. The 
DF localization in WLANs was first introduced in [22] 
along with feasibility experiments in a controlled envi- 
ronment. Several papers followed the initial vision to 
provide different techniques for detection and tracking 
[HI nil [71 [16]. However, all these techniques focus on 
the problem of a single entity. Tracking multiple
entities, to date, has been considered an intractable
problem due to the exponential increase in the number of
training combinations required.

Spot, on the contrary, is designed to provide accurate
and efficient (i.e. linear training complexity)
multi-entity DF localization for WLAN environments.

8. CONCLUSION 

We presented the design, analysis, and implementation
of Spot: a system for accurate and efficient multi-entity
device-free WLAN localization. Spot leverages
probabilistic techniques to provide a smooth and
consistent environment image. It uses a cross-calibration
technique and an energy minimization framework to reduce
the calibration overhead to linear in the number of
locations, which turns DF multi-entity tracking into a
tractable problem. We showed that the selected energy
minimization terms lead to an efficient solution by mapping
the energy function to a binary graph-cut problem. We
further showed how to perform clustering on the generated
environment map to remove outliers and enhance accuracy.

Implementation on standard WiFi hardware in two
different testbeds shows that Spot can achieve a 1.1m
median distance multi-entity tracking error, which is
better than the state-of-the-art techniques by at least 36%
in both testbeds for the zones-based difference and 15.49%
in average error for the locations-based difference. In
addition, it can estimate the number of entities correctly
to within one entity difference 100% of the time. This
highlights the promise of Spot for a wide range of
multi-entity DF tracking applications.

Fig. 10: A heatmap highlighting the system output. Two close entities are present on the left and
another entity is present on the right.

Fig. 11: Effect of changing the window size (w) on accuracy (zones-based difference).

Fig. 12: Effect of changing the clustering inconsistency threshold (r) on accuracy (zones-based
difference).

Fig. 13: Effect of changing the fingerprint density (n) on accuracy (zones-based difference).

Fig. 14: Effect of changing the number of streams (k) on accuracy (zones-based difference).

Fig. 15: Effect of changing the HMM order (o) on accuracy (zones-based difference).

Fig. 16: CDF of distance error.

Fig. 18: Running time for the different components of Spot and a comparison with other systems'
running time.
Table 4: Comparison of different RF-based DF localization systems.

| | MIMO Radar-based Systems | Radio Tomographic Imaging (RTI) | Nuzzer System | Spot System |
|---|---|---|---|---|
| Special hardware required | Yes | Yes | No | No |
| Number of special nodes | Few | Many | None | None |
| Number of streams | N/A (echo based) | Large (756) | Small (6) | Small (6) |
| Coverage area | Limited (high freq.) | Limited | Yes | Yes |
| Computational Complexity | Low | High | Moderate | Low |
| Accuracy | Very High | High | Moderate | High |
| Multi-path effect | | | | |
| Multi-entity tracking | | | | |

APPENDIX

A. PROOF OF THEOREM 2 

The proof is based on showing that the optimization
problem solved by the min-cut on the constructed graph
is equivalent to the optimization problem in Equation 6.

Proof. For any node x in the constructed graph
(Figure 7), if this node is assigned the label S, i.e. its
value is $a_x^t = 1$, after running the min-cut algorithm,
then part of its corresponding contribution in the optimal
cost is $P(s^t \mid a_x^t = 1) + P(a_x^t = 1 \mid a_x^{t-1}, a_x^{t-2})$.
Similarly, if this node is assigned the label T, i.e. its
value is $a_x^t = 0$, after running the min-cut algorithm,
then part of its corresponding contribution in the optimal
cost is $P(s^t \mid a_x^t = 0) + P(a_x^t = 0 \mid a_x^{t-1}, a_x^{t-2})$.
Both are equivalent to the unary terms in Equation 6.

Now consider any two nodes x and y in the constructed
graph (Figure 7). If these two nodes have the same label,
no extra terms are contributed to the optimal cost in the
minimal cut. However, if x is assigned to S and y is
assigned to T or vice versa, an extra term will be added to
the optimal cost in the minimal cut, which is equal to
$e^{-\|P(s^t \mid a_x^t) - P(s^t \mid a_y^t)\|^2}$. This
corresponds to the binary term in Equation 6.

Therefore the cost of the minimal cut in the constructed
graph is equivalent to the minimum of the summation of
both the unary and binary terms. Therefore the binary
graph-cut solution on the constructed graph is a solution
to the corresponding energy minimization problem in
Equation 5. □
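The equivalence argued in the proof can be checked numerically on a toy instance. The sketch below is a generic illustration and does not use the energy terms of Spot: it assumes arbitrary non-negative unary costs and pairwise weights, builds the corresponding s-t graph, solves the min-cut with a plain Edmonds-Karp max-flow, and compares the result with a brute-force search over all labelings. All names (`min_cut_labels`, `energy`) and all numeric values are hypothetical.

```python
from collections import deque
from itertools import product

def min_cut_labels(unary, pairwise):
    """Minimize sum_i unary[i][a_i] + sum_(i,j) w_ij * [a_i != a_j] by s-t min-cut.

    unary[i] = (cost of label 0, cost of label 1); pairwise = {(i, j): w_ij}.
    Nodes left on the source side S get label 1, the rest label 0.
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (cost0, cost1) in enumerate(unary):
        cap[s][i] = cost0  # cut when i ends on the sink side (label 0)
        cap[i][t] = cost1  # cut when i ends on the source side (label 1)
    for (i, j), w in pairwise.items():
        cap[i][j] += w
        cap[j][i] += w

    def bfs_parents():
        """Breadth-first search over edges with remaining capacity."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:  # Edmonds-Karp: augment along shortest paths
        parent = bfs_parents()
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow
    reachable = parent  # nodes still reachable from s form the S side
    return [1 if i in reachable else 0 for i in range(n)]

def energy(labels, unary, pairwise):
    """Total cost of a labeling: unary terms plus penalties on disagreeing pairs."""
    total = sum(unary[i][a] for i, a in enumerate(labels))
    total += sum(w for (i, j), w in pairwise.items() if labels[i] != labels[j])
    return total

unary = [(4.0, 1.0), (3.0, 2.0), (1.0, 5.0)]
pairwise = {(0, 1): 2.0, (1, 2): 1.0}
labels = min_cut_labels(unary, pairwise)
best = min(energy(l, unary, pairwise) for l in product((0, 1), repeat=3))
print(labels, energy(labels, unary, pairwise), best)
```

On this instance the labeling recovered from the cut has the same energy as the brute-force optimum, which is the behavior the proof establishes for the constructed graph.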



